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Abstract. This paper discusses the family of distributions on the Grassman- 
nian G(m, r) of the linear span of r central normal vectors in K m or C m , 
parametrized by the covariance matrix (up to a positive factor). Our main 
result is an existence and uniqueness criterion for the maximum likelihood es- 
timate of a sample in G(m, r), based on convexity and asymptotic properties 
of the log-likelihood. By coupling methods of algebraic geometry and linear 
programming, we show that almost all samples of size n > m 2 /r(m — r) in 
G(m, r) have a unique MLE. 

In the real case, a new, unexpected phenomenon takes place for some values 
1 < r < m, which does not occur in the angular Gaussian case r = 1. Random 
samples of some critical size in G(m, r) may have a unique estimate or not, 
with a positive probability in either case. 



1. Introduction 

As stated in [20], the current data deluge inundating science is remarkable for 
the rapid proliferation in new data type. Typical examples are directions in R™ or 
elements of the Grassmann manifold G(m, r) of all vector subspaces of dimension r 
of M." 1 (0 < r < m), as introduced in [4]. Being of increasing importance in practical 
situations (see e.g. [7], [H], [15], [18], [19], [20] or [21]), there is a strong need for 
studying various classical inference problems, like for example maximum likelihood 
estimation. To deal with these problems, one can in most cases reparametrize the 
manifold and recast the inference problem in some Euclidean space. However, this 
can have the effect of hiding intrinsic geometric properties of the statistical relevant 
objects (see below). 

A typical example is obtained when dealing with G(m,r) when r = 1, the set 
of axes or directions in R m , see e.g. [T2] and [21] ■ In [T3], the manifold is endowed 
with the angular Gaussian distribution, that is of the law of the random direction 
obtained by retaining only the axis of a multivariate centered gaussian random vec- 
tor in R m of covariance matrix E. Kent and Tyler |13] derived sufficient conditions 
for the existence of the maximum likelihood estimator (MLE) based on an i.i.d. 
sample by working on R m ; the angular Gaussian distribution is then equivalent 
to the Cauchy law. The mathematical analysis can then be performed in M m_1 , 
at the cost of loosing nice properties of the problem. In [2], the whole picture was 
obtained using mainly convexity. The parameter space Pos(to) consists of positive 
definite self adjoint matrices of determinant 1, which is considered as a Riemanian 
manifold with a natural metric. The results derived in [2j make strong use of this 
manifold structure, of the particular form of the log likelihood function and of the 
geometric link between the parameter space Pos(m) and the sample space G(m, r), 
r = 1. Interestingly, the estimated scatter matrix plays a fundamental role for 
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multivariate nonparametric tests, where it is known as the Tyler's transformation 
matrix, see e.g. [17j . or in finance where the maximum likelihood estimator is used 
to fit financial data, see [3J- 

When r is arbitrary, we obtain random subspaces by retaining only the linear 
span U —< Xi,--- ,x r > of an i.i.d. sample of r multivariate centered gaussian 
random vectors of covariance matrix E G Pos(m). The law of this random subspace 
has been considered previously in the literature and has been termed as the matrix 
angular Gaussian distribution (see e.g. [3], [5] or [5]); however, basic questions like 
the existence of the MLE remain unexplored. 

We will show that a new phenomenon emerges: In most statistical settings, the 
MLE based on some sample u\, ■ ■ ■ ,u n exists with probability one when the size 
n is larger that a critical value n c and does not exist with probability one when 
n < n c , like for example in the angular Gaussian case with r = 1 (see e.g. [2]). In 
the Grassmannian setting, we show in Example 2 of Section 3 that there are sizes n 
such that the MLE exists with positive probability and does not exist with positive 
probability (see e.g. pQ where a similar phenomenon occurs in logistic regression). 

Section 2 introduces the Grassmannian statistical model and the related likeli- 
hood function. Section 3 considers the problem of existence and uniqueness of the 
Grassmannian maximum likelihood estimate (GE). Our main results, Theorems 1 
and 2 give necessary and sufficient conditions for the existence of a unique GE. The 
geometrical setting is illustrated in Examples 1 and 2. Section 4 provides funda- 
mental properties of the likelihood function like its convexity when restricted to the 
geodesies of Pos(m). This nice property is then used to prove Theorems 1 and 2. 
Theorem 4 of Section 5 shows finally that the GE of almost all samples of size n is 
unique when 



r(m — r) 



2. The Grassmannian statistical model 

2.1. Grassmannian distributions. We present two versions of the Grassmannian 
model, real or complex. To treat them in parallel, we set F = R or C, and denote by 
A* eF sxr the adjoint of a matrix ieF rxs , i.e., the transpose of X if F = R and 
the complex conjugate of the transpose of A if F = C. A square matrix E G flr mxm 
is self-adjoint when E = E*, i.e., symmetric if F = R and Hcrmitian if F = C. 

Let Xi, . . . ,x r G F m be i.i.d. random vectors in F m with central normal distribu- 
tion of positive definite self-adjoint covariance matrix E. The density of the normal 
law is exp(x*E _1 x/2) (x G F m ) up to a constant factor in both the real and the 
complex case. We define the Grassmannian distribution of parameter E as the law 
of the linear span (xi, . . . , x r ) of these vectors in F m . It is a Borel probability mea- 
sure Gn on the Grassmann manifold G(m, r) of all vector subspaces of dimension r 
of F m (0 < r < m) . The parameter E of a Grassmannian distribution is defined 
up to a a positive factor only. We remove this indeterminacy by requiring the de- 
terminant of E to be 1. So, we parametrize the Grassmannian distributions by the 
space Pos(m) of positive definite self-adjoint matrices E G flr mxm f determinant 1. 

Given a regular matrix A G F" 1 **" 1 , the random vectors Ax\, . . . ,Ax r are i.i.d. 
with central normal law of covariance matrix AT, A* . Hence, the image measure 
of under the transformation of G(m, r) given by AU — {Ax \ x G U} for 
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U £ G(m,r) is 

(1) AGv = Ga*a*- 

In fact, the Grassmannian statistical model (0s)sep os (m) is the unique family of 
B orel probability measures on G(m, r) indexed by Pos(m) enjoying the equivariance 
property §Q for all matrices A £ p mxm j determinant 1. To see this, observe that 
condition ([1]) implies the invariance of Gt, under the group of invertible matrices A of 
determinant 1 such that AY, A* = E. As this group is compact and acts continuously 
and transitively on G(m, r), there is a unique Borel probability measure on G(m, r) 
which is invariant under it, namely Gt.- 

Let us represent a point U £ G(m,r) as the linear span U = (x\, . . . ,x r ) of 
linearly independent vectors Xi,...,x r of U or, equivalently, as the range U — 
(X) of the matrix X = (x±, . . . , x r ) of rank r. Then, a computation shows that 
the density, or Radon-Nikodym derivative, of the Grassmannian distribution Gn 
(E £ Pos(m)) with respect to the uniform distribution Qi on G(m,r) (I = identity 
matrix) is given by 

(2) ~dS; {{X)) -{det(X*^X)) 

where iw — dimR(F) (see [J] for the real case). The meaning of this formula is 
perhaps more apparent in the form 

^7 (t/)= U(g S nt/) J ^ G(m,r)), 

where — {x 6 F™ 1 | x*Y.~ 1 x < 1} denotes the ellipsoid associated to £ (£j — 
unit ball), and vol the Lebesgue measure on U . 

When r = 1, the Grassmannian distribution Q-^ is known as the (real or complex) 
angular Gaussian distribution of parameter £ S Pos(m) on the projective space 
P m_1 = G(to, 1) (see [2]). For any < r < m, the Grassmann manifold G(m,r) 
can be viewed as the space of projective subspaces of dimension r — 1 of P m_1 by 
identifying a vector r-subspace U of F m with the projective subspace {y £ P m_1 | 
y C C/}. In this projective interpretation, the Grassmannian distribution Qy, on 
G(m, r) is the law of the projective span of i.i.d. random points yi, . . . , y r of P m_1 
with angular Gaussian distribution of parameter E . 

2.2. Grassmannian maximum likelihood estimates. Let P be a Borel proba- 
bility measure on G(m, r). Typically, we think of P as being the empirical measure 
(Su 1 + ■ ■ ■ + Su n )/n of a sample U\, . . . , U n in G(m, r), but other cases are of inter- 
est too. A parameter E £ Pos(m) is called a Grassmannian (maximum likelihood) 
estimate — abbreviated GE in the sequel — of P if it maximizes the log-likelihood 
i*G(m r) l°s(dGE/dGi) dP . It is called a GE of a sample U\, . . . , U n £ G(m, r) when 
P is the empirical measure (Su t + • • • + 6u n )/n. 

For convenience, we shall rather work with the following negative version of the 
log-likelihood 

(3) ME) = -— / lag{dGs/dGi)dP = [ iu{Y)dP{U), 

lyrn J G (m,r) JG(m,r) 

where the (negative) log-density £jj is defined by 
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With this notation, a GE of P minimizes lp. 

3. Existence and uniqueness of the Grassmannian estimate 

Theorem 1. A Borel probability measure P on the real or complex Grassmannian 
G(m,r) has a unique GE if and only if 

(5) / dim(£7 n V) dP(U) < — dim(V) 

JG(m,r) m 

for all nontrivial linear subspaces V of¥ m (0 =/= V ^ ¥ m ). 

In the case of an empirical measure P = (S^ + • • • + 5u n )/n, 

Corollary 1. A sample U±, ...,£/„ in the real or complex Grassmannian G(m,r) 
has a unique GE if and only if 

i n 

(6) - V dim(Ui r\V)<— dim(V) 

n m 
i=i 

for all nontrivial linear subspaces V of¥ m (0 =/= V ^ ¥ m ). 

The proof of the theorem will be presented in the next section. Let us first 
consider some special cases. 

Example 1. When r = 1, G(m, r) is the projective space P m_1 , and the Grass- 
mannian distributions are known as angular Gaussian distributions. In this case, 
dim(£/fcfW) = 1 or in Corollary Q] according to whether Uk Q V or not. Hence, the 
necessary and sufficient condition for a sample of size n in p ln_1 to have a unique 
angular Gaussian maximum likelihood estimate is that the number of points of the 
sample contained in a nontrivial vector subspace V of¥ m be less than ndim(V)/m 
(see [2] for a more precise result). 

Now, almost all samples in p m_1 are in general position, ie., any nontrivial vector 
subspace V of F m contains at most dim(V) points of the sample. Thus almost all 
samples of size n > m in p m_1 have a unique angular Gaussian maximum likelihood 
estimate. This result goes back to [23] . On the other hand, no samples of size n < m 
in P m_1 have a unique angular Gaussian maximum likelihood estimate since any 
point U G p m_1 of a sample is, of course, contained in the one-dimensional subspace 
V = U of F m , so that the condition for the number of points of the sample contained 
in V to be less than ndim(V)/m is not satisfied when n < m. 

For a Grassmannian G(m,r) which is not a projective space, the situation is 
more involved, even in the simplest case m — 4, r = 2. 

Example 2. Let U\, . . . , U n be a sample in the Grassmann manifold G(4, 2), viewed 
as the space of lines in the projective space P 3 . Suppose that the lines U\, . . . ,U n 
are pairwise skew, i.e., Ui R Uj = for i ^ j. Examining case by case all of the 
possible values of dim(V) and dim(f/, (~l V) in Corollary [TJ we find that the sample 
has a unique GE if and only if n > k, where k is the maximum number of lines of 
the sample all of which are met by some line V S G(4, 2). Now, given a line V, we 
can choose any number n of pairwise skew lines Ui, . . . , U n S G(4, 2) meeting V, 
so that n = k. Hence, there are arbitrary large samples of pairwise skew lines not 
having a unique GE. What is needed is a bound for k. 
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Recall that the lines meeting each of three pairwise skew lines Ui, Ui and C/3 
form a one-dimensional family 5i of lines on a quadric surface Q C P 3 , whereas 
the other family 5F 2 of lines on Q consists of the lines meeting every line of 5i . A 
point of intersection x of a further line E/4 G G(4, 2) with the quadric Q determines 
a line meeting each of the four lines Ui, U2, U3 and C/4, namely the line V G 3r 
through x, and vice versa (see Fig. [T]). 



The number of lines meeting four pairwise skew lines U±, U2, C/3 and C/4 is thus 
o 2 if C/4 meets Q transversally, 

o if C/4 does not meet Q, which can occur only when F = R, 
o 1 if C/4 is tangent to Q, 

o infinite if C/4 lies on Q 7 in which case C/4 G Ji so that every line meeting 
Ut, U% and C/3 necessarily meets C/4 too. 

In both the real and the complex case, there are at most two lines V G G(4, 2) 
meeting each of four pairwise skew lines, except when the four lines belong to the 
same family of lines on a smooth quadric. So, almost all samples of size n in 
G(4, 2) consist of pairwise skew lines of which at most four are intersected by a line 
V G G(4, 2). We conclude from the criterion above that almost all samples of size 
n > 4 in the real or complex Grassmann manifold G(4, 2) have a unique GE. 

In the complex case, there is a line meeting each of any four pairwise skew lines 
C/i, U2, C/3 and C/4 since C/4 always meets the quadric Q. The same holds if some 
of the four lines meet together or even coincide. Thus, by Corollary [TJ no samples 
of size n < 4 in the complex Grassmann manifold G(4, 2) have a unique GE. 

The situation is different in the real case since C/4 need not meet the quadric Q. 
If we choose four lines at random, there may be a line meeting each of them or not, 
with a positive probability in both cases. Therefore, the probability that a random 
sample of size n — 4 in the real Grassmann manifold G(4,2) has a unique GE is 
positive and < 1. 

On the other hand, by Corollary [TJ no samples of size n < 4 in the real Grass- 
mann manifold G(4, 2) have a unique GE since any n < 4 lines are intersected by 
some line (in fact, by infinitely many lines). 



We first introduce notions from linear algebra which are necessary to settle the 
likelihood equation on the symmetric space Pos(m). Consider the scalar product 




Figure 1 . Two lines meeting each of four lines 



4. Likelihood equation 



(7) 



(x\y) s =x*^ 1 y (x,ye¥ m ) 
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associated to a parameter E G Pos(m). We denote by 7T[/(S) the E- orthogonal 
projector onto a vector subspace U of F m . It is the linear map 7r;y(E) '■ F m — * F m 
defined by nu(Y,)u = u if u G U, and ttu(T,)v = if v G F" 1 is E- orthogonal to [/, 
i.e., = for all u £ U. In matrix notation, 

(8) 707 (E) = X(X*E~ 1 X)" 1 X*E" 1 , 

where J7 = (X) is the range of X G F mxr . We call a matrix A G F mxm self-E- 
adjoint if (^4x|y)s = (x\Ay)s for all x,y € F m or, equivalently, if it coincides with 
its E- adjoint E.A*E _1 . 

The parameter space Pos(m) is a Riemannian manifold, in fact a symmetric 
space. Its tangent space Ts at E G Pos(m) consists of the self-E-adjoint matrices 
v G F mxm of trace zero, and the Riemannian metric is defined by the scalar products 

(9) (vi,v 2 ) = tr(uiva) («i,«2GT s ) 

on the tangent spaces Ts, where tr(A) denotes the trace of a matrix A. The 
geodesic 7 : M — > Pos(m) of velocity v G Ts issuing from E G Pos(m) is 

(10) 7 (t) = e tl 'E(e t, ')* = e 2to E (t G M), 

where e denotes the matrix exponential. 

Deriving the expression (j4|) along a geodesic (|T0|) and using the matrix form (jHJ) 
of the E-orthogonal projector 7T[/(E) onto U, we find the gradient (with respect to 
the Riemannian metric ((9]) defined above) of the log-density 

(11) grad£c/(E) = — /-7r [/ (E) (U G G(m, r), E G Pos(m)), 

m 

and the covariant derivative of grad ly in the direction of v G Ts 

(12) V, grad^(E) = nu(T,)v(I - ^(E)) + (/ - 7r c/ (E))w c/ (E). 

By integrating these formulas with respect to P, and interchanging integration and 
derivation by means of the Lebesgue dominated convergence theorem, we get the 
gradient of the log-likelihood ([3]) 

(13) gradME) = — I - [ ^(E) dP(U) (E G Pos(m)), 

m JG{m,r) 

and its covariant derivative 

(14) V„gradME) - / [7rc/(E)w(J - 7rc;(E)) + (7 - 7r C /(E)>7r [/ (E)]dP([/). 

JG{m,r) 

A function / on Pos(m) is called convex if its restriction /(7(A)) (f G M) to any 
geodesic 7 is convex in the usual sense. This amounts to saying that the Hessian V 2 / 
is positive semi-definite, i.e., V 2 /(E) = (V„ grad /(E), v) > for all E G Pos(m) 
and v G Ts, since 

(15) |*/Wt)) = = (V ug rad/( 7 (t)),«>, 
where v is the velocity of the geodesic 7. 

Proposition 1. The log-likelihood function lp is convex. More precisely, its re- 
striction £p("f{t)) (t 6lj to a geodesic 7 is either strictly convex or affine linear. 
The latter case occurs if and only if v{U) C U for P -almost all U G G(m, r), where 
v is the velocity of the geodesic. 
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Proof. The convexity can be obtained directly by proceeding as in [2j . On the other 
hand, one can use the fact that the log-likelihood function is a Busemann function 
for the symmetric space Pos(m) (see e.g. [S]), and convexity follows. □ 

As the log-likelihood function lp is convex, its minima are exactly the zeroes of 
its gradient hence, by formula (fl"5|) . 

Theorem 2. A parameter £ S Pos(m) is a GE of a Borel probability measure P 
on G(m, r) if and only if it satisfies the maximum likelihood equation 

(16) / nu(E)dP(U) = -I. 



G(m,r) m 



Proof of Theorem [JJ 



One can either proceed as in [2J, or use the fact that the log-likelihood functions is 
a Busemann function of the symmetric space Pos(m), see e.g. [9]. The maximum 
likelihood estimator is then the barycenter of the related probability measure on 
the Grassman manifold, viewed as an orbit in the Tits boundary.Theorem[T]then 
follows from Proposition 6.2 of [IT] . 

5. The linear programming bound 

In order to apply the criteron of Corollary [TJ for the existence and uniqueness of 
the GE of a sample, we must first answer the following question. 

Given vector subspaces U±, . . . , U n S G(m,r) of dimension r of¥ m and integers 
di, . . . , d n > 0, on what conditions is there a vector subspace V € G(m, s) of dimen- 
sion s of¥ m such that dim(Uk H V) = dk for k = 1, . . . ,n? A necessary condition, 
using methods of algebraic geometry, is given by Proposition [2] below. 

In a second step, we look for all possibilities with < s = dim V < m and 



1 n 

- > d k < — 



k=l 



using methods of linear programming. This leads to the following. 
Theorem 3. Almost all samples of size 



n > 



r(m — r) 

in the real or complex Grassmann manifold G(m,r) have a unique GE. 

Our main tool is the Schubert calculus on the Grassmannian G(m, s). In general, 
the Schubert variety (|10j. [8]) associated to a Young diagram or partition 

A = (A l7 ... , A m _ s ) (s > Ai > A 2 > • • • > A m _ s > 0) 

with at most s rows and m — s columns and a complete flag 

= F C F 1 C ■ ■ • C F m = F m 

of vector subspaces of F m is defined as 

n x = {V e G(m, s) | dim(F m _ s+i _ Ai n7)>i,l<!<s}. 

It is an irreducible algebraic subvariety of codimcnsion |A| = Ai + • • • + A m _ s of the 
Grassmannian G(m, s) of dimension s(m — s). 
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In particular, given U G G(m,r) and an integer d such that 
max{0, r + s — m} < dk < min{r, s}, 

the set 

S d (U) — {V £ G(m, s) | dim(C/ n 7) > <i} 

is the Schubert variety Q\ associated to the rectangular Young diagram A = d k 
with k = m + d — i — s rows and d columns if we choose the flag in such a way that 
F r = U. So, 

codim Sd(U) — d(m + d — r — s). 

Proposition 2. The following property holds for almost all samples (U\, . . . , U n ) 
in the real or complex Grassmann manifold G(m, r). For any vector subspace V of 
dimension s of¥ m , 

(17) max{0, r + s — in} < dk < min{r, s} for k = 1, . . . , n, and 

n 

(18) dk(m + dk — r — s) < s(m — s), 
fe=i 

where dk = dim(J7fc (~l V). 

Remark. The conditions (fl"7)) and (| 18[) are necessary for the existence of a vector 
subspace V such that c4 = dim([/fcfW) for fc = 1, . . . , n. But they are not sufficient, 
as shown by the example m = 6, r = 3, s = 3, n = 2, d\ = di — 2. In this case, the 
inequalities (fP7]) and (jT8j) are satisfied, although there is in general no V G G(6, 3) 
meeting U\ and U2 in subspaces of dimension 2. 

To get necessary and sufficient conditions, we need the Schubert calculus. But 
computations in the Schubert calculus (Littlewood-Richardson coefficients) are al- 
gorithmically hard [16] so we must content ourselves with Proposition [2l 

Proof of Proposition^ The inequalities (|17|1 for dt = dim(J7fc H V) follow from the 
dimension formula 

dim([/fc n V) + dim(L4 + V) = dha(U k ) + dim(V). 

The proof of the rest of the proposition uses standard methods of algebraic geom- 
etry. 

Let d\ , . . . , d n be arbitrary integers satisfying the inequalities (|17p and consider 
the algebraic correspondence 

C={((Ui,...,U n ),V) G G(m,r) n x G(m,s) \ dim(C/fe n V) > d k , k = 1, . . . ,n}. 

The range of C is the whole of G(m, s), and its domain Ad lt ... y d n consists of the 
samples (Ux, ■ ■ ■ , U n ) e G(m, r) n for which there is some V G G(m, s) with dim(t//cn 
V) > dk for fc = 1, . . . , n. Let ((C/i, . . . , U n ), V) be a generic point of C. Observe 
that 

C-\V) = {(Ux, . . . , U n ) G G(m,r)" | ((C/i, . . . , C/„), V) G C) 
= S dl (V) x---xS dn (U n ), 

where Sd k (V) = {U G G(m, r) \ dxm(U R F) > dfe} is a Schubert variety with 

codim Sri,, ( V) = dk(m + dk — r — s) 
as explained above for Sd{U). 
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According to the principle of counting constants [10 , 

dim ..,</„ + dimC(Ui, ...,U n ) = dimG(™, s) + dimC _1 (l/), 

where C(U\, . . . , U n ) consists of all V e G(to, s) such that dim([4 Pi V) > dk for 
k = 1, . . . , n, hence 

&\mA dl: ... 4n < dimG(m, s) + dim C _1 (F) 

n 

— s(m — s) + (r(m — r) — dk(m + dk — r — s)) 

k=\ 

n 

= dimG(m, r) n + s(m — s) — dk(m + dk — T — s). 

fc=i 

This shows that A^ u ^ is a proper algebraic subset of G(m,r)" if the inequal- 
ity (fT8|) is not satisfied. Let N s be the union of An lt .. ^ where (d±, . . . , d n ) runs over 
all those lists of integers satisfying the inequalities (| 1 T|) but not the inequality (fT8|) . 
and let N be the union of N s for s = 0, . . . , m. As a finite union of proper alge- 
braic subsets, N is also a proper algebraic subset by the irreducibility of G(m, r) n , 
hence negligible. Now, take a sample (U±, . . . , U n ) € G(m, r) n not belonging to N, 
and any vector subspace V of any dimension s of F™\ Set dk = dim([//- n V) for 
k = 1, . . . , n, so that (J7i, . . . , [/„) G Aii,. ..,<*„ ■ Then (di, . . . , d n ) must satisfy the 
inequality (fT5| . otherwise (J7i, ...,[/„) would belong to N by the very definition 
of N. This proves the Proposition. □ 

Consider next the set B(m,r,s) of those positive integers n for which there are 
integers d\ , . . . , d n satisfying the inequalities 

(17) max{0, r + s — to} < dk < min{r, s} for k = 1, . . . , n, 

n 

(18) dk(m + dk — r — s) < s(m — s), 

k=l 

n 

(19) TO ^Z ^fe — nrS ' 

fc=l 

and set B(m, r) = U^li 1 B(m, r, s). 

Lemma 1. Almost all samples of size n ^ B(m,r) in the real or complex Grass- 
mann manifold G(m,r) have a unique GE. 

Proof. Suppose that n £ B(m,r). According to Proposition [21 the following holds 
for almost all (Ui, . . . , U n ) 6 G(m, r) n . For any proper vector subspace V of dimen- 
sion s of F m , the integers dk = dim([4 (~l V) satisfy the inequalities (fT7|) and (fT8|l . 
But they do not satisfy the inequality (fT9|) since n ^ £?(to, r) hence n ^ i?(m, r, s). 
Thus m X^fe=i dk < nrs, which is precisely the condition (J6j> of Corollary [T] for the 
sample U\, . . . , U n to have a unique GE. □ 

Lemma 2. For any integers to, r with < r < m, £/ie set B(m, r) is bounded above 
by to 2 /r(m — r). 

Proof. As B(m, r) — IJ^Lj 1 B(m, r, s), we first look for an upper bound of B(m, r, s). 
To this end, we replace the unknowns d\, . . . , d n in the the definition of B{m, r, s) 
by the number 

n% = e {1, . . . , n} | d k = i} 
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of occurences among di, ■ ■ ■ , d n of each integer i between i$ and i\, where 

i{) = max{0, r + s — ni) and i\ = min{r, s}. 

With these new unknowns rii , . . . , n^, the inequations (|17H19[) translate into the 
system of linear inequations 

(20) m > for i a <i< h, 

ii 

(21) i(m + i — r — s)rii < s(m — s), 

i=ia 
ii 

(22) ^2(rs - mi)rii < Q, 

i=i 

with n = X)iLi n i- So, B(m,r,s) consists of those integers n which decompose 
into a sum n — Y^iLi n i °f integers ni satisfying the inequalities (|20H22|) . The 
maximum of B(m, r, s) (if any) is the solution of the integer linear program 

»i 

maximize m subject to the constraints (|20H22|) . 

i=io 

Relaxing the integrality condition on rii yields a usual linear program with real 
rii , . . . ,rii 1 , whose solution is an upper bound of B(m,r,s). Standard methods 
of linear programming [22] show that the constraints (fT7l - [T9)) define a bounded 
polytope whose vertices are of one of the following two types. 
siin — s) 

o Tii — -- — for some i and — - for k ^ i. 

i(m + i — r — s) 

o rii and nj are the solutions of the system of equations 

i(m + i — r — s)m + j(m + j — r — s)nj = s(m — s), 
(rs — mi)m + (rs — mj)rij = 0. 

and rifc = for k ^ 

Now, routine computations show that the sum n = X)iLi n i reaches its maximum 
on vertices of the the first type when i = \rs / m] , and on vertices of the second 
type when i + 1 = j = \rs/m~\. It can then be checked that these maxima are 
bounded above by the quantity m 2 /r(m — r). □ 

Theorem [3] immediately follows from Lemma [1] and [21 

6. Numerical algorithms 

Let P be a probability measure admitting a unique maximum likelihood estima- 
tor. We propose here two algorithms to locate this estimator, using the geometry 
of the problem (see Section 2]). The first one is a gradient-descent dynamics. The 
second one is a faster method which avoids the time consuming steps of the first 
one. 

We look for the solution E to the equation (fTB]) . The Exponential map Exp s 
from Ts to Pos(m) is given explicitely by Exp s (w) = e"T,. Given some and £' = 
Exp Sfc (w), the idea is to approximate the gradient grad£p(E') using the parallel 
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transport of grad^p(Sfe) + Vt,(£fc). One then computes the solution Vk+i G Ts fc 
to the linear system 

(23) gradMEfe) + V„ fc+1 grad^ P (S fe ) = 0. 

The loop is closed by setting £fc+i = Exp Sfc (vk+i). 

The step which consists in solving (|23|) is time consuming, so that we propose a 
faster dynamics: Given we use the geodesic 7fc(f) = e 2 * grad ^- p ( E ' c " ) I]fc, and set 

(24) £ fc+1 = 7fc (l) = e 2 s rad M£ fc ) S]fc . 

Our simulations indicate that the sequence CSk)k>o converges toward the max- 
imum likelihood estimator S". We have performed a simulation study using n — 
50, 500, 5000 i.i.d. random samples {X 1 ),--- ,(X n ), (X 1 ) € G(4,2), distributed 
according to the Grassmannian distribution of parameter Sq given by 



1.23943 0.53234 0.21763 0.33038 

0.53234 1.12502 0.76236 0.20842 

0.21763 0.7626 1.52821 0.82655 

0.33038 0.20842 0.82655 1.52298 

The probability measure P is then the empirical distribution on G(4, 2) associ- 
ated with the random sample. Our simulations indicate that the maximum like- 
lihood is consistent. For a random sample of size n — 500, we found that the 
difference between Sq and the estimate E n is given by 



0.0282495 0.0095817 0.0791341 -0.0819841 
0.0269432 0.1031291 -0.0447055 
0.1444463 -0.0051798 
-0.1134552 

For n — 5000, this difference was given by 



0.01223629 



-0.0100086 
-0.0209799 



0.0110916 
-0.0366614 
-0.0571491 



-0.0221974 
-0.0114825 
0.0010570 
0.0380609 
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